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In the standard structure formation scenario based on the cold dark matter paradigm, galactic 
halos are predicted to contain a large population of dark matter subhalos. While the most massive 
members of the subhalo population can appear as luminous satellites and be detected in optical 
surveys, establishing the existence of the low mass and mostly dark subhalos has proven to be 
a daunting task. Galaxy-scale strong gravitational lenses have been successfully used to study 
mass substructures lying close to lensed images of bright background sources. However, in typical 
galaxy-scale lenses, the strong leasing region only covers a small projected area of the lens’s dark 
matter halo, implying that the vast majority of subhalos cannot be directly detected in lensing 
observations. In this paper, we point out that this large population of dark satellites can collectively 
affect gravitational lensing observables, hence possibly allowing their statistical detection. Focusing 
on the region of the galactic halo outside the strong lensing area, we compute from first principles the 
statistical properties of perturbations to the gravitational time delay and position of lensed images 
in the presence of a mass substructure population. We find that in the standard cosmological 
scenario, the statistics of these lensing observables are well approximated by Gaussian distributions. 

The formalism developed as part of this calculation is very general and can be applied to any halo 
geometry and choice of subhalo mass function. Our results significantly reduce the computational 
cost of including a large substructure population in lens models and enable the use of Bayesian 
inference techniques to detect and characterize the distributed satellite population of distant lens 
galaxies. 


I. INTRODUCTION 

Dark matter forms the gravitational backbone of most of the observed structures in the Universe, from the largest 
galaxy clusters to the faintest dwarf galaxies. Despite this ubiquity, the nature of dark matter remains a mystery. 
On the one hand, the cold dark matter (CDM) paradigm has been extremely successful at describing observations 
on large cosmological scales such as the cosmic microwave background [1], the clustering of galaxies [2], and cosmic 
shear measurements [3]. On the other hand, this success constitutes a mixed blessing since there is a vast array of 
particle candidates that naturally fall under the CDM umbrella on large cosmological scales. One possible avenue to 
distinguish between this plethora of models is to look on much smaller length scales where clues of the particle nature 
of dark matter are more evident. For instance, the initial free-streaming of warm dark matter particle would damp 
the growth of structure on small scales HHII] while the fluctuation power spectrum of dark matter that couples to 
relativistic species until late times would display both acoustic oscillations and collisional damping P^l - fT7] . On the 
other hand, if these physical phenomena are absent in the dark matter sector, standard structure formation theory 
predicts that galaxies should harbor a very large number of dark satellites [iHl IHl ■ The statistical detection of these 
numerous dark subhalos would validate a key prediction of standard CDM theory. 

Strong gravitational lensing provides a way to probe the distribution of dark matter on the smallest scales . 

For instance, the observations of flux-ratio anomalies in multiply imaged lensed quasars have been used to study 
of abundance of mass substructures within the lens galaxy [Ml [Ml [ZS ESI UHl EH EiH55] . More recently, direct 
gravitational imaging EH EHl ESI EH ED has enabled the detection of massive substructures along magnified arcs and 
Einstein rings. Similarly, resolved spectroscopy of strongly lensed dusty star-forming galaxies has been proposed to 
study mass substructures within the lens galaxy [241 |88l |89] . By construction, these techniques are most sensitive 
to substructures lying close to lensed images, that is, substructures that appear close in projection to the Einstein 
radius of the lens. Since the typical Einstein radius of a galaxy-scale lens is a small fraction of its virial radius, only a 
small number of mass substructures are on average projected close to the region probed by strong lensing [31]. One 
therefore naturally expects that an order unity number of mass substructures could be detectable in each individual 
lens. Meaningful constraints on mass substructure inside lens galaxies can then be obtained by considering a sample 
of galaxies as was recently done in Ref. [M] . 
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While the vast majority of mass substructures in lens galaxies cannot be directly detected in lensing observations, 
the collective effect of substructures far from lensed images can nevertheless cause small perturbations to lensing 
observables that can be statistically detected. For instance, Ref. m studied how astrometric perturbations could be 
used to probe mass substructures, while Ref. [3D] used both astrometric and magnification perturbations to constrain 
the presence of dark clumps within the lens HE0435-1223. In addition, time-delay fluctuations in multiply imaged 
lensed quasars have been proposed [42| as a tool to characterize broad properties of mass substructures within lens 
galaxies. Certainly, the overall population of mass substructures will perturb all lensing observables in a coherent and 
correlated way. 

In this manuscript, we develop a formalism to study stochastic millilensing from a large population of unresolved 
mass substructures inside the halos of galaxies acting as strong gravitational lenses. The aim of this formalism is 
to compute the joint effect of mass substructures on all lensing observables (image positions, magnifications, time 
delays), taking into account all possible correlations among those. Our work builds on the theory of stochastic stellar 
microlensing [DOllIOOj and generalizes the results of Refs. |38l SOI SII- We focus our analysis on multiply imaged 
point sources (e.g. quasar lenses) since these are the most relevant objects where gravitational time delays can in 
principle be measured. As we discuss below, time-delay measurements are crucial in probing the satellites populating 
the outskirts of distant lens galaxies. 

As in some of the stellar microlensing works, we use Markov’s method (see e.g. Ref. [inij l to compute from first 
principles the probability distribution of lensing potential and deflection perturbations in the presence of a population 
of mass substructures inside the lens galaxy. By performing an Edgeworth expansion |I02j , we show that for a realistic 
structure formation scenario the probability distributions are nearly Gaussian. We also compute the leading order 
deviations from pure Gaussianity. The advantage of our analysis is that it allows one to determine which physical 
quantities control the behavior of lensing observables in the presence of mass substructures. This dependence on 
physical parameters is often obscured in studies relying purely on numerical simulations. By removing the need to 
perform such simulations, our approach has the potential to significantly speed up the analysis of substructures inside 
lens galaxies, and provide a convenient way to explore degeneracies between the macrolens and the substructure 
population. Most importantly, it provides a practical way to statistically detect dark satellites inside lens galaxies, 
hence providing a key test of standard cold dark matter theory. 

In this paper, we focus on analyzing the effect of mass substructures that are spatially located beyond a few Einstein 
radii. We leave the analysis of local mass substructures that are spatially located close to lensed images to future work. 
This paper is organized as follows. In Sec.jnj we describe the challenge of statistically detecting a population of unre¬ 
solved mass substructures and explain our approach to tackle this problem via the use of the characteristic function. In 
Sec. Ill we justify the division of the overall substructure population into two subpopulations (distributed and local). 


and we perform the actual calculation of the characteristic function for a population of distributed substructures. We 
show that in the cases of interest its behavior is quasi-Gaussian, and we discuss in which situations non-Gaussianities 
can become important. We also compare our results to the output of Monte Garlo realizations. We then show in 
Sec. |IV| how our approach can be used to marginalize over the distributed substructure population. We discuss which 
physical properties of the distributed substructure population are most relevant to the lensing observables in Sec. |V| 
and we finally conclude in Sec. |vg 


II. STOCHASTIC LENSING: GENERAL CASE 

In this section, we present the general ideas behind our approach to substructure lensing. After brief remarks 
about our setup and notation, we introduce the challenges of lens modeling in the presence of a stochastic population 
of mass substructures. We then present the basic ideas behind our analytical stochastic lensing framework and 
derive important results regarding the joint distribution of gravitational lensing observables. These results are used 
throughout the rest of this paper. 


A. Setup and notation 

For definiteness, we consider a situation where a high-redshift point-like source (such as a quasar) is multiply 
imaged by a massive foreground galaxy whose properties are described by a set of parameters Qgai- For instance, Qgai 
could contain information about the lens Einstein radius, the dark matter and stellar density profiles, their ellipticity, 
etc. The characteristics of the main lens can also depend on the fundamental properties of dark matter (denoted 
by the parameters Qdm) such as its free-streaming length (Afs), its sound horizon (tdao)) its self-interaction cross 
section (ctsidm); and its temperature of kinematic decoupling (T^d). In addition, the main lens galaxy lives in a local 
environment characterized by parameters Qenv which contain information, for instance, about the external shear and 
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Parameters 

Description 

Dependency 

Example 

Qcos 

Cosmological parameters 

- 

^A? -^S? ^s 

qoM 

Dark matter parameters 

Qcos 

Afs, ruAO, USIDM, Tkd 

Qenv 

Lens environment parameters 

Qcos 

External shear, positions and mass of nearby galaxies 

qgai 

Macrolens parameters 

Qcos? QDM? Qenv 

Lens Einstein radius, ellipticity, density prohle 

Qios 

Line-of-sight structure parameters 

Qcos? Qdm 

Nonlinear matter power spectrum parameters 

Qsub 

Substructure population parameters 

Qcos? Qdm? Qgai 

Substructure mass function and their spatial distribution 

Csub 

Individual substructure parameters 

Qsub 

Positions, masses and concentrations of each subhalo 


TABLE I. Summary of our notation for the different sets of parameters relevant to our gravitational leasing analysis. The 
third column indicates the relative dependency of the different sets of parameters while the last column gives examples of the 
different types of parameters. See main text for more details. 


the properties of nearby luminous galaxies. We parametrize the line-of-sight structures (that is, exterior to the main 
lens plane) between the source and the observer via an array qios- Of course, all of these different sets of parameters 
have a dependence on the background cosmology, which we denote as qcos = {^^o, tig}, where Hq is the 

Hubble parameter, and Ha are the energy densities in matter and dark energy, respectively, in units of the critical 
energy density of the Universe, Ag is the amplitude of the primordial power spectrum of scalar fluctuations, and rig is 
the scalar spectral index. Throughout this paper, we use the numerical values for the cosmological parameters from 
the Planck 2015 data release [im]. 

In addition to the spatially smooth component described by qgai, the lens galaxy also contains mass substructures, 
the most luminous of which can appear as satellites orbiting the main lens. We collect the individual properties of 
these substructures in an array Cgub, which could, for instance, contain information about the position, virial mass, 
and concentration of each substructure. Finally, we assume that individual substructure properties are sampled 
from an underlying probability distribution parametrized by an array qgub which encodes information about the 
substructure mass function, their spatial distribution within the lens, and their mass-concentration relation, which 
has a strong dependence on the parameters contained in qoM • We summarize our notation in Table |l] and indicate 
the interdependency of these different sets of parameters. 

We reiterate that our goal is to use gravitational lenses to constrain the substructure population parameters qgub 
and then use this information to learn new insights about the microphysics of dark matter encoded in qDM- Of course, 
determining the impact of a given choice of qcos, qDM, and qgai on the population parameters qgub is highly nontrivial 
and requires detailed numerical simulations. This is a very active area of research and tremendous progress has been 
made in the last decade |104H120] . In this work, we are interested in developing a formal understanding of the impact 
of a given choice of qgub on lensing observations. We defer to future work the interpretation of given qgub constraints 
in terms of dark matter microphysics. 


B. Stochastic lensing by unresolved substructures: Main challenge 


In this subsection, we review the challenges of lens modeling in the presence of unresolved mass substructures. 
Let us imagine that we have a data vector d. In general, d could include the position and surface brightness of the 
multiple images of the source, the time delays between the images, the velocity dispersion of the lens, and other data 
about the lens environment (external shear and convergence). Using these data, we would like to jointly constrain the 
arrays of parameters q = {qgai, qenv, qios} and Cgub- Given a choice of these parameters, one can compute the theory 
observables t(q, Cgub) (encompassing, for instance, image magnifications and positions, as well as relative time delay 
between lensed images) and use them to compute the likelihood of measuring d, .if(d|q, Cgub)- The problem with the 
above procedure is that a given dark matter theory does not predict the positions and masses of individual subhalos 
within the lens galaxy. The fundamental dark matter physics rather makes predictions about the statistical properties 
of subhalos (described here by qgub) such as their mass distribution, their concentration and their spatial distribution 
within the lens. Therefore, the elements of the array Cgub are nuisance parameters that need to be integrated out. 

One could however sidestep this issue by directly specifying the statistics of the substructure population via the 
set of parameters qgub, without having to explicitly draw random realizations Cgub- The immediate problem with this 
approach is that the theory observables are no longer unambiguously specified. Instead, for each choice of substruc¬ 
ture population parameters qgub, one obtains a probability density function for the theory observables P(t|q, qgub)- 
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Formally, this probability density can be written as 

-^(^iQjQsub) — ^suh{^suh\^suh')^Y) t(cj, Cgub)) dCgub; (1) 

where Psub(csub|qsub) is the probability of having a mass substructure population specified by Cgub, given a choice of 
population parameters Qgub, and where (5^ is the fc-dimensional Dirac delta function {k is the length of the t vector). 
Once P(t|q, qgub) is specified, the likelihood of the data d now takes the form, 

.if(d|q,qsub) = J dtP(t|q, qgub)^(d|t), (2) 

where .if(d|t) is the likelihood of the data given the theory observables. Note that if we substitute Eq. Q into 
Eq. § , we obtain 

“^(^Iq^qsub) — ^ .fgub (Csub I qsub)°^(d|q, Cgub)dCgub 5 (3) 

which is just the standard marginalization over the substructure population. Once .jSf(d|q, qgub) is known, it is straight¬ 
forward to construct the desired posterior distribution P(q, qgub|d) oc .if (d|q, qsub)n(q, qsub), where n(q, qgub) is the 
prior probability distribution for q and qgub- 

The above calculation of P(q, qsub|d) hinges on the accurate determination of the likelihood if (d|q, qgub), either 
through Eq. ([^, or directly through Eq. Let us for now focus on the latter approach which has been used 

in various works on mass substructure inside gravitational lenses HOI El El Ei [HI [SUES]- If one could rapidly 
draw a large number of substructure realizations Cgub from the distribution Psub(csub|qsub) and compute the theory 
observables t(q, Cgub) for each of those, one could then replace the integral in Eq. ([^ by a sum of all the realizations 

if(d|q,qgub) if (d, t(q, Cgub)), (4) 

Csub~-Psub (^sub ) 

where the notation Cgub -Psub(qsub) means that Cgub is drawn from the distribution Pgub(qsub)- This approach has 
several drawbacks. First, it is difficult to assess how many realizations are needed to properly estimate the likelihood. 
A related issue is how to identify the substructure realizations that contribute most to the sum and make sure that 
these realizations are included in it. Second, for dark matter models that predict an abundance of subhalos, randomly 
drawing a realization can be a numerically costly process since thousands or millions of subhalos need to be included in 
the lensing calculation. Most importantly, a purely numerical approach obscures which key physical quantities control 
the impact of substructures on lensing observables. While this approach is viable if we are interested in accurately 
knowing the likelihood for a few points in parameter space, it is impractical if we are using a Markov Chain Monte 
Carlo approach to estimate the final posterior distribution of q and qgub- To remedy these problems, we describe in 
the following section an approach that allows efficient computation of stochastic lensing observables while leaving the 
physics of substructure lensing transparent. 


C. Stochastic substructure lensing: Characteristic function approach 

We now turn our attention to an analytical approach to the computation of lensing observables in the presence of 
a population of unresolved mass substructures. The calculation presented here draws from the theory of stochastic 
microlensing in the presence of a uniform star field [SOIHOOj . As a starting point, our technique takes full advantage 
of two simplifying facts about the impact of mass substructures on the lensing observables: 

• For deflection, shear, convergence, and projected gravitational potential, the overall impact of the subhalo 
population is the sum of the contributions from each mass substructure. 

• Each subhalo is independent of all other subhalos in the lens. 

The first assumption is always true and is a direct consequence of the linearity of Poisson’s equation. The second 
assumption is not strictly true since mass substructures may be themselves clustered within galactic halos. However, 
the relative importance of substructure clustering will be diminished by projection effects since lensing in only sensitive 
to the mass distribution integrated along the line of sight. Moreover, tidal interactions between subhalos and the 
tidal field of the host galaxy will tend to erase correlations among substructures within a few dynamical times 
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m- Thus, to a good approximations, we can use the above simplifying facts to make progress in evaluating 
P(t|q, qsub)- For the moment, let us focus on the lensing quantities that receive purely additive corrections from 
the substructures. These include the projected gravitational potential the deflections <3 = V0, the convergence 
K = {4'xx + 4>yy)/‘2, as well as the shears 7 c = {4>xx — 4>yy)/‘^ and 7 s = 4>xy We denote these “linear” lensing quantities^ 
by tL = { ..., where the index j denotes that these quantities are evaluated at the 

position of the jth image. Our goal is to take advantage of the linearity to first compute P(tL|q, qsub) and then 
reconstruct P(t|q, qsub) using the relation 

P(t|q, qsub)= J dtLP(tL|q, qsub)fc(t - t(tL)). (5) 

The linear quantities tn receives contribution from both the smooth mass model and its environment (described by 
q) and the mass substructures themselves 

N 

tL(q, Csub) = tL(q) + E (6) 

where tL(q) describes the contribution from the smooth model and its environment, while = (5tL(q,Ci) is the 
contribution from mass substructure i. Here, N is the total number of subhalos within the lensing galaxy. Since 
tL(q) is completely fixed by a given choice of q, the inherent stochasticity of tL is entirely caused by the random 
mass substructures inside the lens galaxy. Defining Atn = relevant statistical information about the 

mass substructures is contained in the probability density function <i)Af(AtL|q, qsub) for the collective effect Atn of 
N substructures on the linear lensing quantities. Once is known, the probability density P(tL|q, qsub) appearing 
in Eq. © is simply given by 

P(tL|q,qsub) = ^Ar(tL - tL(q)|q,qsub)- (7) 

Effectively, $ 7 v(^tL|qj qsub) is an Z-dimensional joint probability distribution for I sums of N independent random 
variables, where I refers to the number of linear lensing quantities included in the analysis. Take $i(i5t[(^|q, qsub) to 
be the joint probability distribution for the linear lensing quantities in the presence of a single substructure. For now, 
we assume that we know the functional form of <i>i((5t[)^|q, qsub); its formal derivation is given in the next subsection. 
Since the subhalos are assumed to be independent of each other, $ 7 v(AtL|q, qsub) is given by the A-fold convolution 

of <I)i((5tL^|q, qsub) |101) . We then take advantage of the convolution theorem to write the characteristic functioij^ of 

'—' 

$iv(AtL|q,qsub) in terms of that of qsub), 

QAT (kb I q, qsub) = gi(kL|q, qsub)^, (8) 

where kn is the Fourier conjugate variable to Atn, Qjv(kL|q, qsub) is the characteristic function of $Ar(AtL|q, qsub), 
and gi(kL|q, qsub) is the characteristic function of <l>i(<5t[(^|q, qsub)- 

Now, in a typical galactic dark matter halo the number of mass substructure N is large but unknown. Given a total 
convergence in dark matter substructures and a subhalo mass function, we can compute the average expected total 
number of mass substructures (N) [see, e.g. , Eq. below]. Since the evolution of mass substructures within lens 
galaxies is determined by the complex interplay of accretion, dynamical friction, tidal striping, baryonic feedback, 
and mergers, the actual number of subhalos will typically differ from this average value. Detailed A-body simulations 
[122] of massive galaxies show that the scatter about the mean is consistent with that of a Poisson distribution. Then, 
the resulting characteristic function for the whole substructure population is a sum over all possible values of N, 
weighted by their Poisson probability with mean (A), 

°° (l\f\N 

Q<iv>(kL|q,qsub) = -^gi(kL|q,qsub)^ 

AT^O 

= exp [(A)(gi(kL|q, qsub) - !)]• (9) 

This result states that if one could compute (3'i(kL|q, qsub) for a single mass substructure, then one could ob¬ 
tain the characteristic function for the whole population of unresolved subhalos by taking the exponential of 
(A)(gi(kL|q,qsub) - !)■ Finally, $(Ar) (AtLjq, qsub) can be obtained by Fourier transforming Q(Ar) (kLjq, qsub)- 
Therefore, we have reduced the computation of P(tL|q, qsub) for (A) subhalos to that of computing 9i(kL|q, qsub) 
for a single substructure which is a considerable simplification. 


^ Note that we use the nomenclature “quantities” and not “observables” since 0 , q, k ,7c and 7s are not directly observable. 
^ In this work, the characteristic function is simply the Fourier transform of the probability density function. 
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D. Characteristic function for a single snbstrncture 


To complete our formalism, we need an expression for gi(kL|q, Qsub), the characteristic function of the linear lensing 
quantities in the presence of a single mass substructure. We begin by writing down an expression for <i)i((5tL|q, qsub), 


$ 


.(5tL|q,qsub) = f Psuh{ciH\qsuh)S‘^ - ^tL(q,c^„[)) 


( 10 ) 


where are the parameters describing the properties of a single mass substructure. Here, T’sub(Cs,'jjj|qsub) is the 
probability density function describing the probability of finding a clump of dark matter with parameters given a 
set of substructure population parameters qsub- The characteristic function of the above distribution, gi(kL|q, qsub), 
is simply the Fourier transform of Eq. (10), 


gi(kL|q,qsub) = J e*'^*^^‘’’‘'="‘’^ ‘'^-Psub(c^i|,|qsub)- 


( 11 ) 


Computing this integral requires us to specify the spatial geometry over which the mass substructure is distributed 
as well as the subhalo mass function inside the lens galaxy. In the next section we describe our strategy to evaluate 
this characteristic function. 


III. CHARACTERISTIC FUNCTION FOR SUBSTRUCTURE POPULATION 

Up to this point, we emphasize that our analysis has been very general and is purely based on the linearity 
and independence of mass substructures inside galactic halos. In this section, we consider how the geometry of the 
substructure distribution inside galactic halos can be used to simplify the calculation of gi(kL|q, qsub)- As we describe 
below, it is advantageous to divide the substructure population into a distributed subpopulation that contains the 
vast majority of subhalos and contributes small perturbations to lensing observables, and into a local subpopulation 
that contains a few strong perturbers to lensing observables. 


A. Local versus distributed substructure populations 

We wish to compute the characteristic function for the linear lensing quantities in the presence of a single substruc¬ 
ture at typical lensed image locations {x^} situated close to the Einstein radius Rdn of the lens. Similar to the analysis 
of Ref. [38j . our strategy is to divide the image plane into two regions: (i) an inner disk of radius R^in containing all 
the lensed images and a relatively small number of substructures (denoted “local” substructures), and (ii) an annulus 
with inner radius i?min and outer radius i?max containing the vast majority of the substructure population, which we 
shall refer to as the “distributed” population. This choice is illustrated in Fig. The first thing that is evident from 
Fig. a is that the strong lensing region (red innermost circle) of typical galaxy-scale lenses probes the very inner part 
of the galactic halo. This is the region where flux ratio anomalies have been used to probe mass substructures within 
lens galaxies Eoi [ssiis Hi m EiiS] - This is also the region where direct gravitational imaging [1I1I2H1I1S1113IH7] 
and spatially resolved spectroscopy [2ll[Z2l|88] can be used to detect individual mass substructures within galaxy-scale 
lenses. In this area of the lens plane, it is possible for a mass substructure to cause a large perturbation to lensing 
observables which are known to cause non-Gaussian “heavy tails” [35] in stochastic lensing probability density func¬ 
tions. Furthermore, subhalos can have significant overlap with lensed images, implying that the internal properties of 
mass substructures such as their concentrations and tidal radii can be probed in this regime [20j . Due to its small size 
compared to the overall spatial extent of the dark matter halo, the inner region contains a relatively small fraction of 
the total number of mass substructures in the gravitational lens. 

On the other hand, the outer region of the lens halo (the area outside the dashed blue circle) contains the vast 
majority of the lens galaxy mass substructures. Since they are quite distant from any lensed image, these subhalos 
cannot significantly affects the lensing observables on an individual basis. However, their collective effect is not 
necessarily negligible. Furthermore, because of their relative position with respect to the strong lensing region, we do 
not expect their internal structure to play a significant role in their lensing signatures. Importantly, these properties of 
the distributed mass substructures considerably simplify the calculation of the characteristic function gi(kL|q, qsub)- 

It is instructive to compare the relative contribution of the distributed and local subhalo populations to the linear 
lensing quantities. Writing the total contribution from substructures as 


AIl = -f AtS('"‘, 


( 12 ) 
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FIG. 1. Illustration of the various scales involved in galaxy-scale substructure leasing. The typical Einsteins radius Rdn of 
lens galaxy (~ 1 arcsec) is indicated in red, while the typical virial radius of the galactic halo is indicated in green. The 
dashed blue circle and the outer black circle indicate our choice of i?min(= 3i?oin) and i7max(= 657?oin) for the computation of 
the substructure characteristic function, respectively. The scattered dots represent a realization of a substructure population 
with the spatial distribution gi ven in Eq. (191 with a core radius given by Vc = 307Zein. Here, we have assumed a power law 
mass function as given in Eq. (20l with /3 = —1.9, Miow = W’^Mq, and Mhigh = The average convergence in mass 

substructure is taken to be (Ksub(i7ein)) = 6 x 10~'*. See main text for a description of the notation. The inset at the bottom 
right shows an enlargement of the halo’s central region. In general, only an order unity number of substructures are projected 
close to the Einstein radius of the lens. 


let us compare the local and distributed pieces for potential fluctuations, deflections, convergence, and shears. To do 
so, we generate 10^ Monte Carlo realizations of mass substructure population. We assume the substructures to have 
smoothly truncated Navaro-Frenk-White (NFW) three-dimensional density profiles given by |123j 


p(rsub) 


^^NFW / \ 

47rr(rsub + rs)^ W^b + ’’t / 


(13) 


where rgub is the three-dimensional distance from the center of the subhalo, rg is the scale radius, and rt is the tidal 
radius. We note that our choice of NFW profile is more conservative than the often used Pseudo-Jaffe profile since 
the latter has a steeper inner density slope and has thus a larger lensing efficiency. It is important to keep in mind 
that observations of low-mass galaxies show mild preference for even shallower density profiles, implying that the 
magnitude of the local substructure perturbations discussed in this section are likely to be conservative upper bounds. 
For the truncated NFW profile, the mass scale Mnfw is related to the total mass Msuh of a substructure via the 
relation |123j 


Msnh = ^nfw ^^2 ly - 1) In r -b TFT - (t^ -b 1)] 


(14) 


where r = rt/rg. We take the relation between the substructure mass and its scale radius to be (see Appendix [A|) 


1 kpc 


= (1.0 ±0.3) 


Mgub 

IO^Mq 


0.735 


(15) 



































where we have taken into account the scatter in this relation as inferred by A^-body simulations |124) . We also take 
the tidal truncation radius to obey the standard relation |12511126) 

Msuh 

[2 - dlnMniain/dlnr3D]Mi„ain(< J'sd) 

where tsd is the three-dimensional distance between the mass substructure and the center of the main lens halo and 
-^main(< ?' 3 d) Is the fraction of the mass of the main halo contained in a sphere of radius tsd- For a spherical main 
lens with a power-law convergence profile 

\ ^ ^main 

^main(^) ” 2 ( 7 *) ’ (^inain 7^ 2 ) ( 1 "^) 



\ 1/3 

) I'SD, (16) 


where 6 is a length scale closely related to the Einstein radius of the main lens, r is the projected two-dimensional 
distance from the center of the lens, and amain is the power-law index of the density profile, the tidal truncation radius 
takes the form (see Appendix [A| for more details) 


n 


/o- ^ \-l/ 0 . ^main 

amain M,nb \ ( 

2 -amainr( 5 ^“) 2 VFSerit&y J 


(amain 7 ^ 2 ), 


(18) 


where r(a;) is the gamma function and where Ecrit is the critical density for lensing. The substructures are taken to 
be spatially distributed in two-dimensional projections according to a “cored” profile for 0 < r < -Rmax given by 


Vr{r,e) = 


( 1 


\2'Krl WiRi^^-^jrc) - IJ (1 -b (r/rc)Y 


where W{x) = 


1 


In (1 


(19) 


and where is the core radius of the substructure distribution. This spatial distribution profile constitutes a good 
approximation to the radial substructure distribution found in A^-body simulations [106] . The core radius Tc is found 
to be a large fraction of the main halo virial radius. Here, we take Tc = 30i?cin, where i?ein is the Einstein radius 
of the smooth lens. For a typical galaxy-scale gravitational lens with i?oin ~ 1", this gives Tc 189 kpc for a lens 
at redshift ^jens ~ 0.5. We define the boundary between the local and distribu ted popu lation to lie at i?min = 3i?oin 
and also choose i?max = 65i?oin- We note that rsD is relat ed to r via ran = w here h is the projection of 

rsD along the line of sight, which must lie in the range — l?max ~ ^h< ^ i?max ~ lor a spherical halo. When 

we generate the Monte Carlo realization, we first choose r from the probability distribution in Eq. (19), and then 


randomly pick h from the above range in order to generate the value of ra^. We note though that in a realistic halo, 
the values of h will not in general be uniformly distributed within the above range. However, since h only enters in 
the calculation of the truncation radius, the impact of this approximation on our results is very small. 

We take the substructure to be distributed in mass according to a power-law mass probability distribution 


VixfiMsuh) — pr — ^^i3+i 71J/3+1 ’ l^iow < Afsub < Afhigh, {P Y 1)7 (20) 

where /3 is the power law index, and where Mhigh and Miow are the highest and lowest subhalo masses inside the lens 
galaxy, respectively. As was found numerically in Ref. |l()6j . we take /3 = —1.9. We also choose Mhigh = IO^^Mq 
and Miow = lO^M©. While Miow is typically much lower in standard cold dark matter models |127L 1128) . this latter 
choice ensures that the number of mass substructures inside the lens galaxy is manageable within our Monte Carlo 
realizations. The actual number of mass substructures indeed the lens galaxy is taken to be Poisson distributed 
around a mean value given by 


(A) 


(b^sub(Rein)) 

/ dMsuh f rdrdd VM(Msub)'Pr(r, 0)KtNFwHr - Rcinl) 


( 21 ) 


where the angular bracket denotes ensemble averaging over many substructure realizations of the lens halo and 
KtNFw(r) is the convergence profile of a single smoothly truncated NFW subhalo as given in Ref. |123j . Equation (21) 
follows from the independence of subhalos within the lens galaxy. We take the average convergence in mass substruc¬ 
tures at the Einstein radius of the main lens to be (/«sub(Rcin)) = 0.001. We note that setting (wsub) as above is 
equivalent to choosing an overall normalization for the subhalo mass function [see Eq. (41) below for more details]. 

We illustrate in Fig. the probability distributions for both the distributed and local contributions to the total 
linear lensing quantities for our 10"^ Monte Carlo realizations. For lensing potential and deflection fluctuations, we 
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FIG. 2. Probability distributions for the local and distributed contributions to the linear lensing quantities (j)suh, Osub, 7 sub, 
and Ksub- These quantities are evaluated at the Einstein radius of the main lens, which we take to be iilein = 1”. We 
assume the lens to be at redshift ^lens = 0.5 with a source at redshift Zsource = 1 


Ecrit = 1.19 X lO^^M 0 /arcsec^. We define the divide between the distributed and local contributions to lie at R, 


yielding a critical density for lensing 

= 3f?cin 

and include mass substructures up to 7?max = 657?ein. The substructures are spatially distributed according to a cored profile 
[Eq. (19l] with core radius Tc = 30iZein. Mass substructures are taken to have a smoothly truncated NEW profile with tidal 
truncation radius that depends on subhalo mass and halo-centric distance as given in Eqs. ( |16[ ) and ( |18[ ). We use a subhalo 
mas s-co ncentration relation derived from N-body simulations |124| and also implement the scatter about this relation [see 
Eq. ( |l5[ )]. We assume a power law subhalo mass function with slope /3 = —1.9 between Miow = lO^M© and Mhigh = 10 ^°Mq. 
We take the average lensing convergence in mass substructure at the Einstein radius to be (Ksub(IZein)) = 0.001. 


observe that the most probable fluctuations are dominated by the distributed substructures. This result can be 
explained by a simple geometrical argument. Indeed, while the contribution to the net deflection from substructures 
inside a thin ring of radius r decays as 1/r for increasing r, the number of mass substructures inside the thin ring 
grows as r for r < r^- Thus, inside the core radius of the substructure distribution, mass substructures located in 
a distant ring can contribute just as much to the total deflection as substructures much closer to the lensed images. 
A similar argument applies to lensing potential fluctuations. This indicates that detailed analyses of time delays 
and astrometric fluctuations caused by mass substructures can yield important information about the distributed 
population of satellite surrounding lens galaxies. 

On the other hand, the substructure contribution to the shear and convergence (which determine the magnification 
perturbations) at a typical image position is largely dominated by the local subhalos. This is due to the fact that 
shear perturbations decay as while convergence fluctuations decay even faster (r~^ for our choice of truncated 
NEW profile). It implies that the contribution from distant rings of substructures is always subdominant compared to 
the local contribution, although the collective shear perturbations from the distributed substructures is not entirely 
negligible. Furthermore, for deflections, shears, and convergence, the largest fluctuations are always dominated by 
the local contributions. These large local perturbations, often caused by a single substructure close to a lensed image, 
have been used to detect individual mass substructures [Ml mi m]. What Fig. i is showing however is that by 
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combining magnification information (largely sensitive to 7sub and Ksub) with astrometric (sensitive to Osub) and time 
delay (sensitive to ^sub) measurements, one could infer important properties about both the local and distributed 
substructure populations inside lens galaxies. This highlights the importance of developing a unified framework to 
jointly handle the different leasing observables, which is a major goal of this work. 

Splitting the mass substructures into two subpopulations allows us to factorize the characteristic function 
given in Eq. ([^ as 

Q{n) (kblq, qsub) = (kL|q, qsub)Q^Wj) (kblq, qsub), (22) 


which follows from Eq. (12) and the independence of each substructure. Here, (N) = {N\) + (iVd), where (Ni) is the 
average number of substructures in the local population, and where (iVd) is the average number of substructures in 
the distributed population. We note that we generically have {N^) >> (M)- In terms of the characteristic function 
for a single subhalo, this implies 


91 (kblq, qsub) = 


(jVi)gi (kblq, qsub) + (fVd)gi (kblq, qsub) 
(N) 


(23) 


We can therefore separately compute the characteristic function for the local and distributed subpopulation and then 
combine them according to Eq. (23) to compute the overall characteristic function of linear leasing quantities. In 
this work, we focus on statistically characterizing the distributed population of mass substructures inside typical lens 
galaxies, which is the dominant contribution for the projected leasing potential and deflections. We leave to future 
work the characterization of the local substructure population, but we note that gravitational imaging techniques 
dU [3ni SSI [57] and resolved spectroscopy [531 EH SH] can provide information about certain regions of the local 
substructure population. 


B. Distributed substructure analysis for potential and deflection perturbations 


In this section, we outline our calculations of the characteristic function 9i'®‘(kL|q, qg^b) for the distributed popu¬ 
lation of mass substructures. We focus exclusively on computing the characteristic function for the projected lensing 
potential and the deflection perturbations since the contribution to shear and convergence perturbations from the 
distributed population of mass substructures is subdominant. As described above, there are key simplifying facts for 
the distributed substructure population: 

• Their overall impact on the lensing observables is small. 

• We can approximate them as a collection of point masses. 


In the point-mass approximation, a single subhalo can be described by three parameters: its total mass Mguh and its 
radial and angular position in the lens plane. In the notation from Sec. this implies = {Msuh,T,S}. In order 
to construct the characteristic function, we need to specify the probability density function T’sub(Afsub) 6*; qsub) for 
these parameters. As in our Monte Carlo examples of Sec. Ill A[ we assume that this density function is separable 
into the product of the subhalo mass function with a spatial density distribution 


^suh (Afgub; r, qsub) (Algub 7 qsub)7^r (r, qsub ) • 


(24) 


For cold dark matter, this separability is supported by A^-body simulations over a wide range of subhalo masses 
[1061112911130] . It remains to be seen whether this separability holds for more general dark matter models or when 
baryonic feedback is taken into account. In the cases for which the mass and spatial distributions are not separable, 
one could split the subhalo population into several subpopulations that each have with their own spatial distribution. 
For simplicity, we assume here that Eq. (24) is valid, but it is clear that our analysis could also be carried out without 
this assumption. As we see below, we do not need to specify an explicit form for the subhalo mass function and 
position distribution at this point since all important quantities can be expressed as statistical moments of these 
distributions. 

Before going through the detailed derivation of g5*‘®*(kL|q, qsub), it is informative to heuristically derive what we 
expect the answer to be. As we discussed in the previous subsection, potential and deflection perturbations obtain 
contributions from a broad spatial projected area surrounding the strong lensing region. The resulting large number 
of mass substructures contributing to the total potential and deflection perturbations implies that the central limit 
theorem is applicable, and we thus expect Gaussian statistics to be approximately valid for these linear lensing 
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quantities. In this approximation, the statistical properties of the linear lensing quantities are entirely specified by a 
covariance matrix Csub with general scaling given by 


''sub 


{N<i) 


dmVm{rn-qsuh)m^ 


d^rrr{r,e-(i,^^)OlOi 


(25) 


where the leading factor arises since the variance of the sum of (fVd) normal random variables is (N^) times the 
variance of a single normal random variable. The second factor arises because the linear lensing quantities are always 
proportional to the subhalo mass, and the third factor is the spatial two-point function of the linear lensing quantities. 
Here, ©l stands for the spatial dependence of the ith linear lensing quantity. As we see below, this scaling comes out 
naturally of our analysis. 

We now turn our attention to the detailed derivation of the above scaling as well as the leading order deviations 
from the Gaussian approximation. The lensing potential difference 4’suhi'^i) between an image position and a 
reference point Xref caused by a point mass M at position x is given by 


(f’subi'^i) = min 


|Xref - X| 


(26) 


where m = M^uh !Since Scrit is the critical mass density for lensing, m has units of area. Since jx^j ^ i?ein ‘C 
|x| for a typical distributed dark matter substructure, we can write down the lensing potential difference at an image 
location as a multipole expansion. Converting to polar coordinates with x = (rcos0,rsin0), we obtain 


j.p 

p^l 

where the dimensionless series coefficients are 

Ap(xj) = i (rf cos {p9{)- rfgf cos (pdref)) 


(/'sub(xi) = -m'Y] — [Ap{xi) cos {p0) + Bp{xi) sin {p9)], 


Bp{xi) = - (rf sin {p9i) - sin (p9ref)), 


(27) 


(28) 


where we have used x^ = (ri cos 0i,ri sin 9i) and Xref = (r^f cos drefj ^’ref sin6*ref). Since the deflections are simply 


related to the lensing potential by derivatives, that is, asub(xi) = Vx^i^sub, we can write expansions similar to Eq. (27) 


for each of these quantities. The only difference is that the series coefficients for Asub are derivatives of Ap(xi) and 
Bp(xi). Taking Ol = Atu/m to denote the vector containing all the stochastic random variables describing the 
perturbations to the linear lensing quantities, we can thus write 


1 

p^l 


Ap cos {p 9) + Bp sin {p 0) 


(29) 


We note that we have divided out the leading factor of the subhalo mass in the above definition since it only leads 
to an overall rescaling of On- In general, On would contain the stochastic variables and for the lensing 
potentials and deflections, respectively, evaluated at all possible image positions i € iVimg. For instance, in the case 

lb) _ 1 / Ji) Q,b) Q,b) 1 

Aub’“sub,a:> Wub.y f 


of a single image with label i, we have 0^"' = | 

rf cos {p9i} - f cos {p9re[} p-i 


Ap — 


,rl * cos {(p - 1)0^},-rf sin{(p-l)0i} 


(30) 


Bp = 


rf sin {p0i} - sin {p^^ef} „-i r, ^ . n-i , 


P 


,rf sin{(p-l)0J,rf cos {(p-1)0,} 


(31) 


where p > 1. We emphasize that Ap and Bp are constant vectors that only depend on the configuration of lensed 
images and are thus independent of the mass substructure population. Taking ku to be the Fourier conjugate of the 
stochastic vector On, the characteristic function for a single dark matter substructure can be written as 


gf’’*(kL|q, qsub) = / d'^r / dm e*”"‘"^'°^Psub(w, r, 0; q^ub) 

J-Hd -) 

= l-f / dV /dm(e*™’^^ °^ - l)Psub(?7i,r,0;qsub) 

'Hd 


( 32 ) 
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where Hd denotes the area of the distributed domain of the lens halo and where we pulled out the leading factor 
of unity since we are only interested in the difference (7i(kL|q, Qsub) ~ 1- Evaluating the above integrals is the most 
difficult part of the calculation. Clearly, for |kL| <C l/(m|C>L|), the phase factor is nearly equal to unity and qi 1, 
while qi ^ 0 for |kL| ^ l/(m|C>L|) since the phase is rapidly oscillating in this regime. We expand (;i(kL|q, qsub) hr 
a power series of mass and spatial moments 

“Tl / ?T-\ P 

<zf^^(kL|q,qsub) = l +/ dV7^.(r,0;q™b)(kL-dLr, (33) 

where the mass moments are given by 


( to ”) = J dmVm{m]qsuh)m'^- 


(34) 


For conciseness, the simplification of the spatial integral appearing in Eq. (331 is presented in Appendix [ b| After these 
simplifications, the characteristic function for the linear quantities in the presence of a single mass substructure 
then takes the form 


^dist 


*"‘(kL|q,qsub) = 1 + 

n—1 


(- 1 )”( to ” 


A^mult -/Vmult 

t^l s^l 


(35) 


= Pj’ where the kernel Kp is given by Eq. (B21. 


where p = {pi,P 2 , ■ ■ ■ is a multi-index with ||p 

It is understood that if a given At or Bt vanishes, then the corresponding pt must also vanish. We emphasize that 
the kernel Kp encodes all the information about the spatial distribution of mass substructures within the lens halo. 
This kernel can be computed for any halo geometry and mass substructure distribution. Finally, we can use Eq. ([^ 
to compute the characteristic function in the presence of a whole population of mass substructures 


‘5<iVd)(l^L|q>qBub) = exp 




(- 1 )”( to ”) 


n—1 


^mult 








s=l 


At leading order, this characteristic function has a Gaussian behavior, 


C(w^)(kL) OC n 


ZU'Icl 9 kr Csub^L 


(36) 


(37) 


where u = (AIl) is the mean vector and Cgub = (AtLAtL) is the covariance matrix. We note that in the case of 
circular symmetry of the galactic halo, the mean vector u exactly vanishes. We give in Appendix some useful 
expressions for the covariance matrix in the case of circular symmetry for two different spatial distributions. 

The non-Gaussian terms in Eq. (361 essentially forms a multivariate Edgeworth expansion (see, e.g. , Refs. unmsi]) 
with successive term decaying as We show the details of this expansion in AppendixO but it is instructive 

to consider the magnitude of the non-Gaussian contributions to (^l) in order to assess the validity of the leading 

Gaussian approximation. At each order n in the 1/(A(j)”/^“^ expansion, the leading order non-Gaussian contribution 
takes the general form 


{Ol 


n!(Ad)”/2-i (to2)"/2 (C>2)V2’ 


(n > 3). 


(38) 


Here, we use the compact notation (Oj)) to represent all possible spatial n-point functions of the different linear lensing 
quantities. In order to evaluate the above expression, we need to specify the mass function and spatial distribution of 
mass substructures. For illustration, we take the spatial distribution given in Eq. (19), and write the mass function 
as 


dN 

kmT 


sub 


I Mo j ^ 


(39) 


where oq is the mass function normalization and Mq is a reference mass scale. Using Eq. (21), the expected number 
of mass substructures in the distributed region is then 


(A^d) = 


]\/fd+'^ 
OO -'^^high 


- M; 


/3-bl 


low 


MP 


i+1 


(1 - Vr{< i?mi„)) , {P ^ -1) 


(40) 
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FIG. 3. Non-Gaussian contributions to the Edgeworth expansion of the characteristic function for different values 

of the mass function parameters. These curves characterize the degree of non-Gaussianity of the probability distribution of the 
linear leasing quantities. A v alue of unity on these plots indicate an 0(1) deviation from Gaussianity. We assume a power law 
mass function as given in Eq. (391 with Mhigh = 10^^Mq and also take the distributed mass substructures to be spatially located 
between i?min = 3i?ein and i?max = 65i?ein according to Eq. (|19[l with rc = 30iZein. We illustrate the leading contribution at 


each order n for n = 3 to n = 6. Here, (A^d) is computed as in Eq. (401. The spatial moments (Ol) are computed assuming 
that Ol is a deflection at a single image position, but similar results would be obtained for the lensing potential. Each panel 
illustrates different mass function parameters as indicated. The top panels fix /3 = —1.9 and display the dependence of the 
non-Gaussian corrections on the ratio Miow/A/high for two different values of uq. In the bottom panels, we fix oq and display 
the dependence on the mass function slope for two values of Miow/A/high . 


where Vr{< i?min) is the cumulative probability of finding a mass substructure within a disk of radius We take 

Mq = Mhigh throughout this work. We note that in the point-mass limit, the convergence in mass substructures is 
related to the mass function given in Eq. (39) via 


(^sub(^ref)) — 


Go 


1 


f/5+2 

low 


Mn ^crit 13 + 2 


{P ^ - 2 ), 


(41) 


where Tref is a reference radius (e.g. i?oin) where the convergence is evaluated. 

We illustrate in Fig.j^the non-Gaussian contributions given in Eq. (381 evaluated from n = 3 to n = 6 for different 
mass function parameters. Here, we take Ol to represent a lensing deflection, but similar results would be obtained 
for the lensing potential. The upper panels illustrate the dependence of the non-Gaussian contributions on the lowest 
subhalo mass for two different values of the mass function normalization og with j3 = —1.9. For comparison, pure 
cold dark matter simulations yield gq ~ 3.8 x 10 “^°Mq^ at the pivot point Mg = IO^^Mq with j3 = —1.9 |106) . 
We observe that for Miow/Mhigh ^0.1 the non-Gaussian contributions are subdominant for the fiducial values of 
Gg = 3.5 X 10“^°Mq^ and (3 = —1.9. Interestingly, the largest non-Gaussian contribution comes from the n = 4 term, 
which implies that the probability density function of linear lensing quantities will primary pick up a nonzero excess 
kurtosis in this case. Further increasing the normalization of the subhalo mass function suppresses non-Gaussianities 
even more (upper right panel) since (Ag) oc Gg. However, as Miow/Mhigh —5^ 1, the non-Gaussian contributions rapidly 
rise since the mass substructure population becomes dominated by a very limited number of massive subhalos and 
the applicability of the central limit theorem wanes. 
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The lower panels of Fig. [^display the dependence of the non-Gaussian corrections on the slope of the substructure 
mass function. Here, we fix the ratio MiowZ-^high and the amplitude of the mass function at Mq = Mhigh- We 
observe that as j3 is made steeper (more negative) the non-Gaussian corrections rapidly decay since the number of 
mass substructures quickly rises with a steepening slope. Decreasing the ratio Miow/Mhigh has little effect for /3 > —2 
but does lead a faster decay of non-Gaussianities for j3 < —2. Again, we observe that the n = 4 term dominates the 
non-Gaussianities when the mass function slope j3 > —2.3 for the realistic normalization of the mass function shown. 
We confirm this observation by comparing our analytical results to Monte Carlo realizations in the next subsection. 

We note that we can also suppress non-Gaussianities by increasing Rmin- Indeed, the non-Gaussian spatial moments 
rapidly decreases as i?min is increased as shown in Fig.For definiteness, we illustrate there the ratio 
of non-Gaussian spatial moments for a single component of a lensing deflection. The results would be very similar 
for other linear lensing quantities. From a practical point-of-view, we would like i?niin to be as small as possible in 
order to encompass as many mass substructures as possible in the distributed analysis. On the other hand, we also 
need to choose a value of i?min large enough for the expansion of Eq. (36) to rapidly converge. Our tests show that a 
minimal radius in the range 3i?oin ^ I?min ^ 5i?ein generally provides a good compromise between these two criteria 
for the power law mass function considered in this work. Of course, for a different choice of mass function one should 
adjust i?min in order to insure the convergence of the Edgeworth expansion. 

The picture that emerges from the considerations above is that for the CDM-relevant case of (3 ^ —1.9, 
Miow/Mhigh ^ 1, and a realistic normalization of the substructure mass function supported by simulations, the 
non-Gaussian contributions to Eq. (36) are subdominant and the joint probability density function of linear lensing 
quantities will thus be well approximated by a multivariate Gaussian. In this physically relevant region, a useful 
criterion for the validity of the Gaussian approximation is 


Oo 


> 


10 (/3-b3)2 (O; 


Mhigh 4!(/3 + 5) {Ol 


(42) 


which is valid for Miow/Mhigh <C 1, /I > —3, and where Oq is the amplitude of the mass function at Mq = Mhigh- For 
our choice of spatial distribution given in Eq. (19) with i?min = 3i?oin, Rmax = 65i?ein, = 30i?oin, and assuming 
Mhigh = 10^°Mq, (3 = —1.9, and i?oin = 1", this criterion reads oq > 2.3 x 10 “^°Mq^ when Ol is a lensing deflection. 
This condition would be slightly relaxed if Ol is taken to be the lensing poten tial instead. Whenever this condition 
is satisfied the characteristic function expansion given in Eq. (36) [see also Eq. (G6|] provides an accurate description 
of the statistical properties of perturbations to the linear lensing quantities caused by distributed mass substructures. 



■^minZ-^ein 


FIG. 4. Dependence of the non-Gaussian spatial moments of a deflection y-component on the value of Amin- We take the 
distributed mass substructures to be spatially distributed between Amin and Amax = 65Aein according to Eq. ( |19| l with = 
SOAoin- It is understood here that the mass dependence of the lensing deflection has been divided out, that is, dsub.j/ = Osuh,y/m. 
We see that the non-Gaussian moments decay as Amin is increased. 
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C. Validity of analytical approach 


We now test the validity of the characteristic function-based approach by comparing its prediction to the distribu¬ 
tions of linear leasing quantities obtained by considering a large number of Monte Carlo realizations of distributed 
substructure populations. As in the analytical calculation of the previous section, we treat the distributed mass 
substructures as point masses that are spatially distributed according to the cored proHle given in Eq. (19) with 
Tc = 30i?oin, between Rmin = 3i?ein and i?max = 65i?oin- We consider the distribution of linear leasing quantities 
at two fiducial image positions located at Xi = (0, i?ein) and X 2 = (i?oin,0), and take Rom = 1". For concreteness, 
we assume a lens at redshift zjens = 0-5 with a source at redshift Zgrc = 1 ) which yields a critical density for leasing 
Scrit = 1-19 X lO^^M 0 /arcsec^. To compute the hnal probability distribution of linear leasing quantities, we sample 
the characteristic function given in Eq. (C 6 ) on a grid of ku and use a fast Fourier transform algorithm to perform 


the transformation back to configuration space. 

In Fig. we compare our analytical predictions to the results f rom Monte Carlo simulations of distributed sub¬ 
structure populations for a subhalo mass function as given in Eq. (20) with Miow = IO^Mq, Mhigh = lO^^M©, and 
P = —1.9, with a normalization given by (Ksub(Aein)) = 0.001. We display different projections of the joint probabil¬ 
ity density function for the linear leasing quantities evaluated at the two hducial image positions. The gray points 
in the 2D plots and the blue histograms along the diagonal show the results from the Monte Carlo realizations of 
distributed s ubst ructure population. We show in solid black the results gotten by only keeping the leading Gaussian 
term in Eq. (C 6 ), while the dashed red lines show the results obtained by keeping all terms up to order 0((Ad)“^) 
in the Edgeworth expansion. Since the mass function parameters listed above predict a relatively large number of 
mass substructures within the lens halo ((A^d) = 3705), the non-Gaussian contributions in Eq. (C 6 | are suppressed 


and the overall behavior of the joint probability density function is very well captured by its leading Gaussian term. 
Nevertheless, we see that including the higher-order terms in the Edgeworth expansion does improve the concordance 
of the analytical predictions with the Monte Garlo realizations. This is especially noticeable in the one-parameter 
probability densities shown along the diagonal where we observe that dashed red lines capture the nonzero excess kur- 
tosis of the Monte Garlo realizations. This indicates that the characteristic function expansion performed in Sec. |IIIB| 
does lead to the correct probability density function for the linear leasing quantities. 

In Fig. we display similar projections of the probability density function of linear leasing quantities, but here we 
choose a high value of the low mass cutoff Miow = 2 x 10 ®Mq, together with Mhigh = IO^^Mq and (Ksub(-Roin)) = 
3 X 10“^. This is an example of a model with very few distributed substructures ((A^d) = 24) for which the leading 
Gaussian approximation still works reasonably well. As clearly shown in the ID histograms along the diagonal of the 
plot, this model does have a significant excess kurtosis which is well captured by the Edgeworth expansion. Again, this 
highlights the usefulness of the expansion given in Eq. (|G 6 |) to understand the leading departure from Gaussianity. 


Interestingly, we observe in Fig. that the perturbations to linear leasing quantities from the distributed mass 
substructures sometimes display strong correlations among themselves. This indicates that the correlation length of 
perturbations to the linear lensing quantities caused by distributed substructures is larger than the typical image 
separation in lens systems, consistent with the findings of Ref. [50]. More precisely, this implies that the linear lensing 
quantity perturbations from distributed substructures are dominated by the dipole (p = 1 ) term in the multipole 
expansion of Eq. (29). For deflections, this term is independent of image position which explains the very tight 


correlation between ^ and and between y and y The scatter about this almost perfect correlation 

is due to the contributions from higher multipoles. We note that this scatter tends to increase at large deflection values 
since these are caused by substructures that are closer to the images and are thus described by higher multipoles. The 
correlation between deflections and lensing potential perturbations also suggests that time-delay fluctuations caused 
by mass substructures are usually accompanied by a corresponding shift to the image position. 

Ultimately, the correlations between linear lensing quantities at different image positions are particularly interesting 
since it may allow one to distinguish the effects of local substructures which mostly affect a single lensed image from 
those of distributed substructures which affect multiple lensed images coherently. However, we generally expect these 
coherent perturbations to be somewhat degenerate with the smooth lens model. For instance, the dipolar (p = 1) 
component of the perturbation can generally be compensated by an appropriate shift to the source position, while 
the perturbation quadrupole (p = 2) could be reabsorbed by an appropriate external shear. We leave the study of 
these potential degeneracies to future work. 


IV. ANALYTICAL MARGINALIZATION OVER DISTRIBUTED MASS SUBSTRUCTURES 


In this section, we first describe how we transform from the linear lensing quantities to the actual gravitational 
lensing observables that can be compared with data. We then explain how we perform the analytical marginalization 
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FIG. 5. Projections of the probability density function for the linear lensing quantities {<^sub, Q^sub.yi 0!sub,xi ®sub,y} 

the presence of a distributed population of mass substructures. Here, the two images are taken to be xi = (0, /Join) and 
X 2 = (i?oin,0), where we take /Join = 1”. In the above, <j)aub stands for the projected potential difference between the two 
images. The gray points in the 2D plots and the blue histograms along the diagonal show the results from 10"^ Monte Carlo 
realizations of distributed point mass-like substructure population. The solid black lines display the analytical results from 
Sec. |III B| assuming a purely Gaussian characteristic function, while the dashed red lines show the results obtained by keeping 
all terms up to order in the Edgeworth expansion [see Eq. (C6|]. In the 2D plots, the inner and outer contours display 

the 68 % and 95% confidence regions, respectively. We assume the mass substructures to be spatially distributed according to 

We also take a power law subhalo mass function with slope /3 = —1.9 between Miow = IO^Mq 
)) = 0.001. This yields an expected number of distributed mass substructures 


„(i) 


„( 2 ) 


,,( 2 ) 


Eq. (191 with Tc = 30/?, 
and A/high = 10^°A/©, and take («;sub(/?. 


(Aid) = 3705. 
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FIG. 6. Projections of the probability density function for the linear leasing quantities similar to Fig. but taking Miow = 
2 X 10®Mq, Mhigh = 1O^®M0, and {K,suh{Rein)) = 3 X 10“"^. The expected number of distributed substructures is (A^d) = 24. The 
gray points in the 2D plots and the blue histograms along the diagonal show the results from 5 x 10^ Monte Carlo realizations 
of distributed point mass-like substructure population. 


over the distributed mass substructure population using the characteristic function (kL|q,Qsub) computed in 

the previous section. Here, we assume that the effect of the local substructure population is perfectly known, which 
is equivalent to setting (5*‘5^|^^*(kL|q, Qsub) = 1- We also derive a general expression for the data likelihood in the 
presence of a distributed population of unresolved mass substructures. We first describe the general calculation, and 
then specialize to the case where (kplq, qsub) is a multivariate Gaussian. 


A. General case 


The first step in to compute how the probability density function of the stochastic variables t is related to that 
of the stochastic variables Atp for which we have computed the characteristic function in the previous section. For 
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definitiveness, we take AIl to contain all the stochastic deflections i G Aimg, caused by distributed substructures 
as well as all the stochastic lensing potential shifts between the ith image and the reference point. In the following, 
we take the reference point to be the position of the image that is leading the arrival time. We also take the stochastic 
vector of observables t to contain the image positions and the time delay between image i and the reference 
image caused by distributed substructures. 

The probability density function for the perturbations AIl'®* to the linear lensing quantities caused by distributed 
substructures is simply 


^(jVd>(^tL|q,qsub) = y ^^Q^w‘)(kL|q,qsub)e (43) 

where it is understood the AtL stands for At^'here. We then apply the transformation given by Eqs. ([^ and Q 
to compute the density function for the t stochastic lensing observables 

-P(t|q,qsub) = y dtL y (t - t(tL)), (44) 

where k is the length of the t vector and is the contribution to the linear lensing quantities from the smooth lens 
model [Eq. (|^]. Note that we generally have I ^ k since, for instance, the two shear random variables are mapped to 
a single magnification perturbation. Nevertheless, as we are only considering potential and deflection perturbations, 
we have I = k here. In practice, the relation t(tL) is nonlinear but can nonetheless be written down and inverted in a 
straightforward manner. However, since the distributed substructure population leads to small changes to the lensed 
image positions and their relative time delays, we can linearize this relation as 

t(tL) « t + A^^(tL - II), (45) 


where t are the lensing observables in the absence of substructures, and where A is a Z by Z matrix encoding the 
transformation between the linear lensing quantities and the actual observables. For instance, in the case where 

t = {x(®\x('’\x(^'\ At(^), and AIl = {ttgubi ®sub>‘^subl^ assuming that image i is the leading 

image, the inverse of the transformation matrix A is given by 


A = 


0 

0 

0 

V 0 


Ms(Xj) 

0 

0 

0 


0 

'l-l 


0 

0 

Ais(xfc)“^ 

0 

0 


0 0 \ 

0 0 

0 0 

-to^ 0 

0 / 


(46) 


where stands for the 2x2 magnification tensor of the smooth lens component evaluated at the unperturbed 

image position x^, and to is the time constant of the lens which is given by 


^0 


1 T -2^1ens kJlDs 
c Dis 


(47) 


where ^lens is the redshift of the lens, c is the speed of light, and Di, Ds, and Dig are the angular diameter distances 
to the lens, to the source, and from the lens to the source, respectively. As it should be apparent from Eq. (461, 
the matrix A = A(qgai, qonv) qcos) depends only the smooth mass component of the lens, its environment, and the 
cosmological model. Substituting Eq. (45) into Eq. (44), we can then perform the tL integration 


-P(t|q,qsub) = |A| y ^^Q(‘)^*)(kL|q,qsub)e 


(48) 


where |A| stands for the determinant of the matrix A. We can finally substitute the above into Eq. ([^ to compute 
the data likelihood in the presence of a distributed population of substructures 


^(d|q, qsub) 


lA 


dt 


dkL 


^Q^i^*)(kL|q,qsub)e 


-i(t-t)T A'^kL g-i (t-d)'TCd(t-d) 




(49) 
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^low/^high 

FIG. 7. Dependence of the product {Nd){m^) on the Miow/A7high for four values of the slope of the mass function. In 
each case, we choose the normalization of the snbhalo mass function ao such that all curves asymptote to the same value as 
A7iow/Mhigh —>■ 0. Our choice of normalization corresponds to ao = 3.8 x 10 “^°Mq^ at Mo = Mhigh = 10^^Mq when P = —1.9. 

where we have assumed a Gaussian likelihood, .5f(d|t) oc exp [—|(t — d)'^Cd~^(t — d)], where Cj is the data covari¬ 
ance matrix. Thus, the data likelihood is given by the Fourier transform of the product of (kL|q, qsub) with 

the Fourier conjugate distribution of .if(d|t) evaluated at the residuals d — t between the data and the predictions 
from the smooth lens model. This makes sense: The Fl modes contributing most to the integral are those that 
can explain the residuals between the actual data and the smooth lens model, and the characteristic function 
encodes whether these modes are likely to contribute to the residuals given the input substructure properties. 


B. Gaussian case 


We now specialize to the case where (kulq, qsub) is well approximated by a multivariate Gaussian. Starting 
from Eq. (491, we have 


.:^(d|q,q,,b) = |A| J 


„-i(d-t)TAT(C,„b-|-ACdA'i')-iA(d-t) 


v/(2^)'|Qub + ACdAT| 

thus leading to a Gaussian data likelihood with total inverse covariance matrix given by 

= AT(Csub + ACdAT)-iA. 


(50) 

(51) 


This analysis shows that the effect of distributed unresolved mass substructures can be thought of as an additional 
source of noise that directly contributes to modeling uncertainty. This extra contribution to the net covariance matrix 
entering the likelihood describes the inherent mass modeling uncertainties due to the lumpiness of massive galaxies 
acting as strong gravitational lenses. Whether the inherent mass modeling uncertainties caused by mass substructures 
are relevant or not in the above likelihood depend on the observational precision of a given data set. Gonversely, in the 
event that precise time delay and astrometric observations are available, they can be used with the above likelihood 
to constrain the physical quantities entering the covariance matrix Cgub- 


V. DISCUSSION 


The analytical approximation developed in Sec. |IIIB| allows us to not only marginalize over the masses and positions 
of distributed substructures but also to understand which of their physical properties are most relevant to gravitational 
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lensing observables. For the physically relevant parameter space, the effect of distributed mass substructures on the 
lensing potential and its first derivative is approximately Gaussian, which implies that most of the relevant physics 
is encoded in the covariance matrix Cgub- Using Eqs. (21) and (Cl), this covariance matrix can be decomposed as 
follow 


"^sub 


= {Nd){m^){OlOi) = (Ksub(rref)) 


jm^) {OjOi) 

( to ) Vrivref)’ 


(52) 


where r^ef is a reference radius where the amplitude of the convergence in mass substructures is set (taken to be the 
Einstein radius of the main lens in earlier sections of this paper), and where 


(OlOi) = I d^Vrir, 9)0101 


(53) 


In going from the first to the second equality in Eq. (52), we use Eqs. (40) and (41) to express (N^) in terms of 
Ksub- We give in Appendix useful expressions for the different entries of Csub- The expected number of distributed 
substructures is given by Eq. ([40|), which for /3 < — 1 and Miow “C Mhigh is approximately given by 




(54) 


The second moment of the mass function (to^) is easily computed from Eq. (20), 


1 /3 + 


1 


1 p + I 


/S+3 

high 


P + 3 - Mtt} 


■^low 






for Miow < Mhigh, 


(55) 


where the last approximation is valid for when —3 < /3 < — 1. The second moment of the mass function thus has a 
rather strong dependence on the minimal subhalo mass. Now, if we look at the product {Nd){mP), we immediately 
see that the leading dependence on Miow cancels out for the physically relevant case —3 < /? < — 1 and Miow 'C Mhigh 


(7Vd)(TO2) = 


1 


Oq 


’^"^2/3 +3 


<th-C 


(1 - Vr{< Rmin)) 


1 




Oq 

f+3 


M^+3 


,)). (56) 


This shows that the scaling of the substructure covariance matrix depends mostly on the normalization of the mass 
function oq and on the largest subhalo mass Mhigh. We illustrate this behavior in Fig. I^for different values of the 
mass function slope /?. In each case, we choose the mass function normalization such that all curves asymptote to 
the same value as Miow/Mhigh —5^ 0. In the regime where Gaussianity holds (Miow/Mhigh ^ 0.1), we observe that 
{Nd){rrP) is roughly constant as Miow/Mhigh is varied. Measurement of this constant could provide a consistency 
test for standard cold dark matter theory. Since there are strong degeneracies between the different parameters in 
Eq. (56), the extraction of individual mass function parameters would require strong priors from either simulations 


or complementary observations. 

In addition to its dependence on the subhalo mass function, the covariance matrix Cgub also encodes important 
information about the spatial distribution of distributed mass substructures. Since each entry of the covariance 
matrix depends on the spatial distribution through a different kernel [Eq. (53)], it is reasonable to believe that 
lensing observables will provide good sensitivity to the spatial distribution of substructures. A detailed analysis of 
the sensitivity of different lensing observations to unresolved substructures will be carried in an upcoming work. 
Looking ahead, we expect that combining magnification information (mostly sensitive to local substructures), with 
astrometric fluctuations (sensitive to both local and distributed substructures) and time delay perturbations (sensitive 
to distributed substructures with some local sensitivity) will lead to a comprehensive picture of the satellite populations 
of distant lens galaxies. 


VI. CONCLUSION 

In this paper, we have computed from first principles the probability distribution of lensing potentials and deflections 
in the presence of an unresolved population of mass substructures that are located beyond the strong lensing region. 
We have determined that for a realistic substructure population, the distribution of lensing potential and deflection 
perturbations is close to a multivariate Gaussian. We have computed the leading order deviations from Gaussianity 
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and used them to determine when the probability distribution ceases to be well approximated by Gaussian statistics. 
We have shown in Sec. |IV| how our technique can be used to efficiently marginalize over the properties of distributed 
mass substructures without having to perform costly numerical simulations of mass substructure populations. 

For simplicity, we have treated distributed substructures as independent point masses, which we believe is an 
excellent approximation for subhalos far away from lensed images. We note that our approach can straightforwardly 
be generalized to clustered and extended substructures if we compute the theory covariance matrix as 

^sub J J d^r'K,{r)Kj{r'){Ksuh{r)Ksnh{Y')), (57) 


where the kernels Ki and Kj depend on which linear leasing quantities are being used. We note that in the point-mass 
limit the above expression reduces to Eq. (52). Therefore, we can see that, in the general case, we are really probing 
the ensemble-averaged two-point correlation function of the distributed substructure convergence field. This two-point 
function, which is in general neither homogeneous nor isotropic, can be directly measured in A^-body or semi-analytic 
simulations, hence providing a way to assess the importance of subhalo clustering (the two-halo term) and to test the 
accuracy of the point-mass approximation. We leave such tests to future work. 

In the present manuscript, we have focused our attention on time delay and astrometry perturbations since these 
are the lensing observables that are most sensitive to distributed mass substructures. Expanding our analysis to 
include mass substructures near lensed images would allow the incorporation of magnification information into our 
framework. Together, the relative flux measurements, positions, and time delays between lensed images have power 
to constrain both the local and distributed substructure populations of a lens galaxy, given appropriate levels of 
measurement precision. Quantifying these precision levels in detail will vary from system to system, and will be the 
subject of future work. A s an example, in the case of time delays, the fluctuations caused by distributed substructures 
are demonstrated to be < 1 day [H], suggesting time delay precision levels on the order of hours. 

One of the main advantages of having an analytical framework to handle mass substructures is that it allows 
efficient exploration of degeneracies between substructure effects (qsub), on the one hand, and the macrolens (qgai), 
its environment (qenv) and possible line-of-sight structures (qios) on the other. Exploring and marginalizing over these 
degeneracies is important in assessing the sensitivity of current and future data to the detection of a population of 
nonluminous mass substructures in the outskirts of distant galaxies. Such a detection would confirm a key prediction, 
or offer a quantitative challenge, of our current paradigm for structure formation based on the CDM scenario. The 
synthesis of all lensing observables which are sensitive to different combinations of local and distant substructures, 
measured with sufficient precision, have the potential to produce a complete picture of the substructure mass function. 
The stochastic millilensing framework developed here is a necessary step toward this goal. 
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Appendix A: Subhalo Scale and Truncation Radii 
1. Scale radius 


In this appendix, we derive a relation between the total mass Mguh of a subhalo and its scale radius r^. Our starting 
point is the relation between the maximum circular velocity inside a subhalo Umax and the radius r^ax at which this 
velocity occurs |124j 


1 kpc 


= (0.72 ±0.25) ( 


10 km s" 


1.47 


(Al) 
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By the virial theorem, we also have 


^max) 


= V„ 


(A2) 


where G is the gravitational constant and Msub(< Cmax) is the subhalo mass enclosed within rmax, which for a 
smoothly truncated NFW density profile given in Eq. ( |T^ is 

4(1 + a;max)rarctan(xi„ax/r) - 2a;inax(l + + (1 + a:max)(T^ - 1) In (i+^max) 

^suh{^ ^max) — Afgub 


(x 




2(1+ a;max) ((t^ - l)lnT + 7rr-(r2 + 1)) ’ 

where Mg^h is the total mass of the subhalo given in Eq . ( [T4| , r = rt/vg (where is the tidal radius), and Xmax = 
T'lna.x/i^s- We Substitute the above expression into Eq. (A2) and maximize the left-hand side to find the radius at which 
the maximum circular velocity occurs. The resulting equation is not solvable analytically, but for realistic values of 
T, we obtain 

2 


~ 2.1626 1 - 


1 


1 -f 


for T > 7. 


(A4) 


We then substitute the above into Eqs. (|A3|) and (|A2|) and use Eq. ([Al| to eliminate Umax from Eq. (|A2|) in order to 

ass of th( 

f Mg^b 


obtain a relation between the scale radius Vg and the total mass of the subhalo Mgub 

0.735 

= 1.0 ±0.3 


1 kpc 


V109 Mq 


(A5) 


where we have neglected a weak dependence on r since it leads to changes that are smaller than the scatter about 
the mean. 


2. Tidal truncation radins 


We use the result of Refs. [mi [US] for the tidal truncation radius of a subhalo of mass M^ub located at a 
distance tsd from the main halo center. 




1/3 


V[2 - dlnMmain/dlnr3D]Mi„ain(< C3 d)/ 

where Mmain is the mass of the main lens halo and where both rt and r 3 D are radii in three-dimensional space (not 
projected). Since the truncation radius is most relevant to the stochastic leasing signal for subhalos lying close or 
within the Einstein radius of the lens, it is sufficient to specify the mass distribution in the vicinity of the leased 
images. It this work, we focus on power-law mass models which have been shown to provide good fits to many 
gravitational lenses. The projected mass density divided by the critical density for leasing for these models is given 
by [T32] 




ir) = U^- 

mainV' J o I 

2 \ r 


(ct^inain 7^ 2) 


(A7) 


where 6 is a length scale closely related to the Einstein radius of the main lens and amain is the power-law index of 
the density profile. We can deproject this relation to obtain the 3-d mass profile of the main lens 


/7main(7’3D) — 
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T 


' 3-a„ 


2y/TTb r (A-^aain) 


b 

7-3D 


3-a„ 


(A8) 


where r(a;) is the gamma function. This relation is easily integrated to obtain Mmain(< "Csd)) which leads a tidal 
truncation radius given by 


n = 


r( 2 


1/3 


2 - an 


'3-a„ 


2v/7fScrit&^ 
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rsB 


We are generally interested in quasi-isothermal inner density profiles (amai 
that we generally have 

,,1/3 2/3 

oc 

for a subhalo of mass Mgub located at a distance tsd from the center of the main lens halo. 


^3D; (o^main ^ 2). (A9) 

' 1) for the main lens halo, which implies 
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Appendix B: Spatial moments of the substructure distribution 


In this appendix, we simplify the structure of the spatial integral appearing in Eq. (33) using a multinomial 
expansion. Keeping only the first iVmuit multipoles in Eq. (291, the spatial integral takes the form 

f C / ^mult 


1 r. 


lUa 


d^r7^^(r,6l;qsub)(kL • C^l)” = / d'^r'Pr(r,0]qsnh)\ “ X! ^ El • Ap cos (pS) + El • Sp sin (p6») 






= (-!)"/ (frVr{r,e]qsnh)\ X! f ^ ) 


-^mult r 


n 


• At cos {te) 


1 Pt -^mult 


-^mult 


n 

S = 1 


—kL • -Ss sin (s 


lP^mnlt+« 


||Dll=n t=l 


P^mult+o 


(Bl) 


where p = {pi,P 2 ) • ■ • )P 2 Ar„uit} is a multi-index with ||p|j = Ph where the kernel K-p is given by 


r /I \ N N It 

Kp= d^rVrir,9-qsuh) l-j cos{t9f* sin (s 


(B2) 


and where 


Pl!p2! ■ • ■P27V„„,t! 


(B3) 


is the multinomial coefficient. 


Appendix C: Edgeworth expansion of the characteristic function 


In this appendix, we perform the Edgeworth expansion of the characteristic function in powers of (Nd) 

We start by writing the Cholesky decomposition of the covariance matrix Cgub as 


Csub = (Ad)(m2)AAT, 


(Cl) 


which is always possible since Cgub is a symmetric positive-definite matrix, and where we have pulled out the overall 
scaling with the average number of mass substructures and the second moment of the mass function. We can then 
define a new normalized Fourier conjugate variable kn as 


kb = V(^d)(w^)A'^kL 


(C2) 


and express the characteristic function as a function of it. Since the characteristic function (^l) of the normalized 
variable kn is related to that given in Eq. (361 by 


= ^ /ndist ( (A'^) ^kb 

V(iVd)(m2)|Ar<'^^> [^{Nd){m^) 


O^Mkb) = 


(C3) 


we obtain 
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where we have taken the mean vector u = (AtL) to vanish, but the above result can straightforwardly be generalized 
to a nonzero mean values of linear lensing quantities. To make the notation more compact, we define 


^mult 

((*kL.dL)")= 5] 


P^mult+o 


(C5) 


We then expand the exponential in Eq. (C4) to obtain the proper Edgeworth expansion of 

jkL'kL / 

QfNTi^i^)= 
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(m3)((IkL • Ol)^) 
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1296(m2)9/2 
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Appendix D: Covariance Matrix for Linear Lensing Quantities 
1. General expressions in the presence of circular symmetry 


In general, the covariance matrix for linear lensing quantities is given by 


C:^b = (A^d)(m2) dWrir^e) 


.P=l 


cos{p9) + B^’-> sinipO) 


1 r 






Ap^cos (t6») + Bp^sin {tO) 


(Dl) 

where Ap ^ denotes the zth component of the vector A"p. In the case of a circular halo with Vr{r, 0) = Vr(r), the above 
expression dramatically simplifies. In the following, we provide convenient expressions for the different entries of the 
covariance matrix for the linear lensing quantities. We take the position of image i to be x^ = (r^ cos 0^, sin 0^), 
that of image j to be Xj = (r^ cos , r^-sin 0^), and the reference point for the projected lensing potential is x^f = 

(rref cos 0ref, J'ref sin ^ref)- Wc usc the notation to denote the difference in projected potential between image i 
and the reference point, that is, = (/'sub(xi) — (^sub(xref)- The covariances for the deflections are given by 
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■^sub,ic^sub,y/ 


(Nd) {m^)'^r^r^JC[2{p + 1)] sin{p(0, - Oj)}, 

p^l 


(D3) 


where the kernel lC[n] is given by 


K,[n] = ' 


drrVr{r) — 


(D4) 


' R„ 


where the leading factor of tt comes from the angular integration over 9. We give explicit expressions for K,[n] in the 
next subsection for two choices of spatial distributions. The cross terms between deflections and lensing potential are 


= (A^d) (to^) ^ (r^ cos [(p - l)9i - p9j\ - cos [(p - 1)9, - p6»,ef]), 
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«lub,y</'iub) = (^d) (to^) (-’’j sin [{p - 1)0* - p9j] + f sin [(p - 1)0^ - p9,ef]) ■ (D 6 ) 


P =1 


Finally, the covariance between projected lensing potential is given by 

= (^d)(m2) cos [p(0* - 0j)] + - rf cos [p(0i - 0ref)] - cos [p(0j - 0ref)])) • (D7) 

p=i P 


2. K. kernel for different choices of spatial distributions 


We now provide explicit expressions for the spatial kernel given in Eq. (D41 above. 


a. Power-law spatial distribution 


We consider the following power-law spatial distribution 

^r )-2 


Vrir) = 


ijr • 




(0<77<2), 


(D 8 ) 


where the case p = 1 corresponds to an isothermal profile, while p = 2 reduces to the case of a uniform spatial 
distribution. For this distribution, the kernel takes the form 




if n^p, 


2(it!l,ax 


In if n = 77 . 


(D9) 


b. Cored spatial distribution 


In this case, the spatial distribution of substructure is given by 

7 . M _ ^j^ _ 

W^(i?max/Cc) - WiR^iJr,)] (1 + (r/re))2 
and where Cc is the core radius. The kernel is then given by 
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where 2 Fi{a^b]C]x) is the ordinary (Gaussian) hypergeometric function. 
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Appendix E: Tables of used symbols 


Symbol 

Description 

a 

Lensing deflection vector 

(^main 

Slope of the main lens galaxy projected density profile 

O^sub 

Deflection vector caused by substructures 

(0 

Q: u 
sub,a: 

x-component of the net deflection caused by substructures at the ith image 

P 

Slope of the subhalo mass function 

7c, 7s 

Lensing shear components 

O^sub 

Magnitude of shear caused by substructures 


fc-dimensional Dirac delta function 

K 

Lensing convergence 

^sub 

Lensing convergence in mass substructures 

A 

Upper triangular matrix from the Cholesky decomposition of the covariance matrix 


Magnification tensor for the smooth lens component, evaluated at position x 

n(q) 

Prior probability distribution of q 

P(r) 

Three-dimensional density profile 

^crit 

Critical density for gravitational lensing 

T 

= n/n 

<!> 

Projected gravitational potential 

(f^suh 

Substructure contribution to (j> 

Tsuh 

Projected gravitational potential difference between the ith image and the reference point 

$i(x|q) 

PDF of X given q, where x is a single independent random variable 

$iv(x|q) 

PDF of X given q, where x is the sum of N independent random variables 


TABLE II. Summary Greek symbols used throughout the manuscript. 


Symbol 

Description 

ao 

Normalization of the subhalo mass function 

A 

Transformation matrix between the linear lensing quantities and the lensing observables 

-Ap, Bp 

Vectors of pth-order multipole coefficients 

b 

Approximate Einstein radius of main lens galaxy 

Csub 

Vector containing the individual substructure parameters 

Sub 

Substructure parameters for a single mass clump 

Csub 

Covariance matrix for the linear lensing quantities 

Cd 

Data covariance matrix 

d 

Data vector 

Di, Ds, Dis 

Angular diameter distances to the lens, to the source, and from the lens to the source 

h 

Projection of tsd along the line of sight 

Hd 

Area occupied by the distributed substructures 

kn 

Fourier variable conjugate to Atr 


Spatial kernel for multipole expansion 

K.[n] 

nth-order spatial kernel for the multipole expansion of the substructure covariance matrix 

.5f(x|d) 

Likelihood of theory vector x given data vector d 

m 

Normalized substructure mass = Msub/(7rScrit) 

Mo 

Reference mass for subhalo mass function 


Mass of main lens galaxy 


TABLE III. Summary of roman and scripted symbols used throughout the manuscript. 
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Symbol 

Description 

Mmin 

Low-mass bound of the subhalo mass function 

Mmax 

High-mass bound of the subhalo mass function 


Mass of main lens halo 

Mnfw 

Mass normalization of the NEW prohle 

Msub 

Substructure mass 

N, {N) 

Number of mass substructure, average number of mass substructures 

Ni, Vd 

Number of local and distributed substructures 

Mmg 

Number of lensed images 

Vmult 

Maximum number of multipole included in the expansion 

Ol 

Vector of stochastic random variables (= AtL/m.) 

P 

Multi-index (vector of multiple indices) 

P(x|q) 

PDF of vector x given parameter vector q 

Psuh 

PDF for the mass and position of substructures 

VlniMsub) 

PDF for the substructure mass 

Vr{r) 

PDF for the spatial distribution of substructures 

Qcos? QDM 

Cosmological parameters, Dark matter parameters 

Qenv t Qlos 

Lens environment parameters, Line-of-sight structure parameters 

qgal 

Macrolens parameters 

Qsub 

Substructure population parameters 

q 

= {<lgal, <lenv, QIgs} 

ai(k|q) 

Characteristic function of <I>i(x|q) 

_local ^dist 

ii ) ?i 

Local and distributed contribution to qi 

QAr(k|q) 

Characteristic function of 'l>]v(x|q) 

/olocal y^dist 

Wn ■> Wn 

Local and distributed contribution to Qjv 

r,e 

Two-dimensional polar coordinates 

rsD 

Three-dimensional position of subhalo within the lens galaxy 

Tc 

Core radius of the substructures’ spatial distribution 

'^max 

Radius where Umax occurs 

^ ref 

Reference radius where zero of projected potential is defined 

Ta 

Scale radius of subhalo 

^sub 

Three-dimensional distance from center of subhalo 

rt 

Tidal radius of subhalo 

-^ein 

Einstein radius of the lens 

-^min 

Minimum radius of the distributed substructure population 

-^max 

Maximum radius of the distributed substructure population 

-Rvir 

Virial radius of the main lens galaxy 

t 

Vector of theory observables 

t 

Contribution to t from smooth mass component and environment 

tb 

Vector of linear leasing quantities 

tb 

Contribution to tn from smooth mass component and environment 


Contribution to tn from the ith mass substructure 

Atr 


Atjocal^ At)*'"* 

Local and distributed contributions to Atn 

At(‘) 

Arrival time delay between image i and the leading image 

u 

= (Atn) 

"^^max 

Maximum circular velocity of a dark matter halo 

Xi 

Unperturbed (from smooth model only) position of the image 

x(b 

Actual position of i**' lensed image 

^max 

= '^max/^s 

'2'lens 

Redshift of the main lens 


TABLE IV. Summary of roman and scripted symbols used throughout the manuscript (continued). 
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